
Tool to downsample a BAM while retaining reads in low coverage areas. #893

Merged

tfenne merged 3 commits into main from tf_downsample_and_norm

Jan 5, 2023

Member

tfenne commented Dec 29, 2022

Still needs tests.


Tool to downsample a BAM while retaining reads in low coverage areas.

b91ea60

tfenne self-assigned this

codecov-commenter commented Dec 29, 2022

Codecov Report

Base: 95.66% // Head: 95.65% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (78140b8) compared to base (da9ecbc).
Patch coverage: 94.64% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #893      +/-   ##
==========================================
- Coverage   95.66%   95.65%   -0.01%     
==========================================
  Files         125      126       +1     
  Lines        7239     7294      +55     
  Branches      507      487      -20     
==========================================
+ Hits         6925     6977      +52     
- Misses        314      317       +3

Flag	Coverage Δ
unittests	`95.65% <94.64%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
...ulcrumgenomics/bam/DownsampleAndNormalizeBam.scala	`93.33% <93.33%> (ø)`
src/main/scala/com/fulcrumgenomics/bam/Bams.scala	`96.40% <100.00%> (+0.27%)`	⬆️
...fulcrumgenomics/umi/ConsensusCallingIterator.scala	`100.00% <100.00%> (ø)`


nh13 reviewed


Member

nh13 left a comment

Just a few comments, one potential bug.

src/main/scala/com/fulcrumgenomics/bam/DownsampleAndNormalizeBam.scala Outdated

+                  /** Returns the coverage at the given genomic position, or -1 if the position is not between start:end. */
+                  def apply(i: Int): Int = {
+                    if (i < start || i > end) -1

Member

nh13 Dec 29, 2022

Consider either returning 0 (zero coverage) or None (no coverage) instead of -1, which feels very Java-like.

Member Author

tfenne Jan 4, 2023

I think actually I can just go back to having it fail if the index is out of bounds.
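The three options discussed here (a -1 sentinel, an `Option`, or failing on an out-of-bounds index) can be compared in a minimal sketch. `CoverageTrack`, `increment`, and `get` are hypothetical names chosen for illustration, and the closed `start..end` coordinate convention is an assumption; this is not the class from the PR.

```scala
/** Hypothetical per-base coverage counter over the closed range start..end. */
final class CoverageTrack(val start: Int, val end: Int) {
  require(start <= end, s"start ($start) must be <= end ($end)")
  private val counts = new Array[Int](end - start + 1)

  /** Adds one unit of coverage to every position in from..to, clipped to the tracked range. */
  def increment(from: Int, to: Int): Unit = {
    var i = math.max(from, start)
    val last = math.min(to, end)
    while (i <= last) { counts(i - start) += 1; i += 1 }
  }

  /** Fail-fast variant: throws if the position is outside start..end. */
  def apply(i: Int): Int = {
    if (i < start || i > end) throw new IndexOutOfBoundsException(s"$i is outside $start..$end")
    counts(i - start)
  }

  /** Option variant: None means "not tracked", which stays distinct from Some(0), "tracked but uncovered". */
  def get(i: Int): Option[Int] =
    if (i < start || i > end) None else Some(counts(i - start))
}
```

Failing fast keeps the return type a plain `Int` and surfaces caller bugs immediately, while the `Option` variant suits callers that legitimately probe positions outside the range.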

src/main/scala/com/fulcrumgenomics/bam/DownsampleAndNormalizeBam.scala

+                    */
+                  def add(t: Template): Boolean = {
+                    val recs = t.allReads
+                      .filterNot(_.secondary)

Member

nh13 Dec 29, 2022

This made me think about supplementary reads. Should those be upgraded to "primary" alignments or be filtered out as well? I'm thinking of a region where we have a large number of both primary and supplementary alignments, and how to pick there.

Member Author

tfenne Jan 4, 2023

I definitely want to include supplementary reads; they're just part of a split-read alignment and therefore contribute to coverage. E.g., imagine an alignment to a circular genome where anything that maps over the breakpoint will have a primary + supplementary alignment; we want to count both of those.
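The circular-genome case can be illustrated with a small sketch. `Rec` is a hypothetical, simplified stand-in for an alignment record (not the fgbio record type), and the 1000 bp genome is an invented example: a read spanning the origin yields a primary alignment at the end of the contig and a supplementary alignment at the start, and both cover real bases.

```scala
object SupplementaryExample {
  /** Hypothetical, simplified stand-in for an alignment record (closed coordinates). */
  final case class Rec(name: String, start: Int, end: Int,
                       secondary: Boolean = false, supplementary: Boolean = false) {
    def length: Int = end - start + 1
  }

  /** Keeps primary and supplementary alignments; drops only secondary ones. */
  def coverageContributing(recs: Seq[Rec]): Seq[Rec] = recs.filterNot(_.secondary)

  /** Total bases covered by the alignments that count toward coverage. */
  def coveredBases(recs: Seq[Rec]): Int = coverageContributing(recs).map(_.length).sum
}
```

With a primary alignment at 950..1000, a supplementary alignment at 1..50, and a secondary alignment elsewhere, only the first two are counted.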

src/main/scala/com/fulcrumgenomics/bam/DownsampleAndNormalizeBam.scala

+                          .sortBy(r => (r.refName, r.start, r.end))
+                          .iterator
+                          .map(r => new Interval(r.refName, r.start, r.end))
+                        new IntervalMergerIterator(iter, true, false, false)

Member

nh13 Dec 29, 2022

It's a shame we can't pass arguments by name here; what do the three booleans mean?

Member Author

tfenne Jan 4, 2023

Agreed. For reference they are:

combineAbutting
enforceSameStrand
concatenateNames
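Scala does not allow named arguments when calling Java-defined methods, which is why the call site above is positional. One common workaround is a thin Scala wrapper whose parameters carry the names. The sketch below assumes a hypothetical `Span` type and a stand-in `mergePositional` function with simplified merge semantics; it is not HTSJDK's `IntervalMergerIterator`, only an illustration of the wrapper pattern using the three flag names listed above.

```scala
object NamedFlagsExample {
  /** Hypothetical, simplified interval; not the HTSJDK Interval class. */
  final case class Span(contig: String, start: Int, end: Int,
                        positive: Boolean = true, name: String = "")

  /** Stand-in for a Java-style API with three positional booleans.
    * Expects input sorted by (contig, start). */
  def mergePositional(spans: Seq[Span], a: Boolean, b: Boolean, c: Boolean): Seq[Span] =
    spans.foldLeft(Vector.empty[Span]) { (merged, next) =>
      merged.lastOption match {
        case Some(prev) if prev.contig == next.contig &&
            next.start <= prev.end + (if (a) 1 else 0) &&
            (!b || prev.positive == next.positive) =>
          val name = if (c) s"${prev.name}|${next.name}" else prev.name
          merged.init :+ prev.copy(end = math.max(prev.end, next.end), name = name)
        case _ => merged :+ next
      }
    }

  /** Scala wrapper: the flags can now be passed by name at the call site. */
  def merge(spans: Seq[Span],
            combineAbutting: Boolean = true,
            enforceSameStrand: Boolean = false,
            concatenateNames: Boolean = false): Seq[Span] =
    mergePositional(spans, combineAbutting, enforceSameStrand, concatenateNames)
}
```

A call such as `merge(spans, combineAbutting = true, enforceSameStrand = false, concatenateNames = false)` then documents itself without a comment.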

src/main/scala/com/fulcrumgenomics/bam/DownsampleAndNormalizeBam.scala

+                      .toSeq
+                    val addsCoverage = recs.exists { rec =>
+                      detector.getOverlaps(rec.asSam).iterator().exists { cov =>

Member

nh13 Dec 29, 2022

Perhaps it's time to have getOverlaps return an iterator itself?

Member Author

tfenne Jan 4, 2023

I think perhaps it's time to move on from HTSJDK's OverlapDetector and implement one in Scala that suits our needs better :/ But not today.
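As a rough illustration of what an iterator-returning lookup might look like, here is a deliberately naive sketch. `Region`, `SimpleOverlapDetector`, and `overlaps` are hypothetical names; the linear scan per contig is an assumption made for brevity, and a real replacement would likely use an interval tree or a sorted index.

```scala
object OverlapExample {
  /** Hypothetical genomic region with closed coordinates. */
  final case class Region(contig: String, start: Int, end: Int)

  /** Minimal overlap lookup: groups entries by contig and scans them linearly. */
  final class SimpleOverlapDetector[A](entries: Seq[(Region, A)]) {
    private val byContig = entries.groupBy(_._1.contig)

    /** Lazily yields every value whose region overlaps the query. */
    def overlaps(query: Region): Iterator[A] =
      byContig.getOrElse(query.contig, Seq.empty).iterator.collect {
        case (r, value) if r.start <= query.end && query.start <= r.end => value
      }
  }
}
```

Because the result is a Scala `Iterator`, a caller can write `detector.overlaps(region).exists(...)` directly, and the scan stops at the first match.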

tfenne added 2 commits

January 4, 2023 03:13


Move function to generate templates in random order into Bams and add test.

45f2984


Added a basic test for the tool

78140b8

nh13 approved these changes


tfenne marked this pull request as ready for review

January 5, 2023 16:07

tfenne merged commit 53c2ae9 into main

tfenne deleted the tf_downsample_and_norm branch

January 5, 2023 22:35
